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Penalized regression methods aim to retrieve reliable predictors among a large set of putative 
ones from a limited amount of measurements. In particular, penalized regression with singular 
penalty functions is important for sparse reconstruction algorithms. For large-scale problems, these 
algorithms exhibit sharp phase transition boundaries where sparse retrieval breaks down. Large 
optimization problems associated with sparse reconstruction have been analyzed in the literature by 
setting up corresponding statistical mechanical models at a finite temperature. Using replica method 
for mean field approximation, and subsequently taking a zero temperature limit, this approach 
reproduces the algorithmic phase transition boundaries. Unfortunately, the replica trick and the non¬ 
trivial zero temperature limit obscure the underlying reasons for the failure of a sparse reconstruction 
algorithm, and of penalized regression methods, in general. In this paper, we employ the “cavity 
method” to give an alternative derivation of the mean field equations, working directly in the zero- 
temperature limit. This derivation provides insight into the origin of the different terms in the 
self-consistency conditions. The cavity method naturally involves a quantity, the average local 
susceptibility, whose behavior distinguishes different phases in this system. This susceptibility can 
be generalized for analysis of a broader class of sparse reconstruction algorithms. 
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I. INTRODUCTION 

In traditional statistics, we are given a sample of ran¬ 
dom observations much larger than the number of un¬ 
known parameters being estimated. However, during the 
last couple of decades, data collection has become much 
more automatic and much more extensive. As a result, 
the limit of large number of unknown parameters appears 
in a variety of fields, ranging from communication tech¬ 
nology to business informatics to systems biology, posing 
challenges to the classical statistical paradigm. Many 
methods for the selection of important variables, pro¬ 
posed within the statistics literature, are combinatorial 
in nature [1]. The explosion of number of possibilities, 
when the number of unknown variables are comparable to 
the sample size, places a huge computational burden on 
the more principled methods, severely limiting their ap¬ 
plicability to large data sets. Practical variable selection 
procedures drastically limit the search space, but they 
are often greedy in nature and potentially unreliable. 

Variable selection via penalized regression [2, 3] has 
gained wide appeal, owing to these concerns regarding 
combinatorial methods. One common setup is a linear 
model, y = Hxo + C> where it is assumed that H is a 
known M x N measurement matrix, Xo is an unknown 
IV-dimensional signal vector, possibly sparse, and the 
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vector ( models the measurement noise. The task is to 
reconstruct Xo from y and H, utilizing, in principle, the 
knowledge about the statistics of the signal and of noise. 

The general form of penalized methods for standard 
linear model, in practice, does not require detailed infor¬ 
mation about statistics of x 0 and (. The task reduces to 
minimizing the following objective function 

x(Acr 2 ) = argmin{- L ^(y-Hx) 2 + AV(x)}. (1) 

x 2(7 

where Act 2 is a non-negative parameter deciding relative 
weight between quadratic loss function and the penalty 
function V in Eq. (1). The penalty function V is chosen 
to suppress effects of noise on estimation as well as to reg¬ 
ularize potentially ‘soft’ directions in the loss function. 
A popular class of penalties are of the form T,a\ x a\ m - 
The choice m = 2 or ( 2 penalty is known as Ridge re¬ 
gression [4], a convex penalty function which has been 
shown to give more accurate predications, when sparsity 
of the solution is not important. Choosing m -*• 0 yields 
the so-called £q penalty which controls sparsity directly 
but leads to a non-convex optimization problem. Finally, 
in = 1 or t\ penalty, has the benefits of being a convex 
function as well as of promoting sparsity [2, 5]. 

When the vector x has K non-zero components, with 
K < M and M < TV, there are many rigorous results on 
the performance of sparse reconstruction based on £i- 
penalization [6, 7]. As a result of the computational 
attractiveness of convex optimization and of the per¬ 
formance guarantees which reach optimality under cer¬ 
tain conditions, (i-penalization has become a popular 
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method, particularly in the context of so-called Com¬ 
pressed Sensing [7]. In the asymptotic limit M, N, K -> 
oo, there is a phase transition in noiseless sparse recov¬ 
ery [8, 9], with constrained optimization of the £-\ penalty 
leading to perfect reconstruction in the ‘good’ phase. 

As one can imagine, the statistical physics of disor¬ 
dered systems offers powerful tools to understand this 
phase transition in the asymptotic limit. In a series of 
works using the replica method, several investigators have 
studied the performance of fi-penalized regression and 
the algorithmic phase transition [10-14]. In comparison 
to rigorous results providing guarantees [6], the statisti¬ 
cal mechanics approach enables a detailed analysis of the 
behavior of the optimization algorithms near the region 
of failure. On the other hand, the replica trick is ap¬ 
plied to a finite temperature system. The non-trivial zero 
temperature limit hides the essential role of local suscep¬ 
tibility in deciding the robustness of performance of the 
algorithms. Also, the derivation of the self-consistency 
conditions obscures the subtlety of certain aspects of this 
mean field theory. 

An alternative to the replica trick, with the number of 
replicas ‘mysteriously’ tending to zero, is to use the cavity 
method [15, 16]. It is a direct approach to mean field the¬ 
ory, originally designed for understanding the nature of 
ground states of certain spin glass models. The method 
has since been applied to a wider class of problems includ¬ 
ing algorithmic phase transitions, some examples being 
the satisfiability problem [17,18] and Hopfield neural net¬ 
works [19]. The cavity method leads to the same results 
as obtained by replica trick [16] for spin glass mean field 
theory and is closely related to the message-passing algo¬ 
rithm in graphical models [20]. We find that for the prob¬ 
lem at hand, the cavity method provides better insight in 
comparison to the replica formalism and has the poten¬ 
tial to lead to substantially better sparse reconstruction 
algorithms. 

The layout of this paper is as follows. In the next 
two sections, we briefly review a finite noise/finite tem¬ 
perature formulation of the problem and the replica ap¬ 
proach to the mean field theory, facilitating comparison 
with our analysis via the cavity method. Readers familiar 
with these results, or those solely interested in the cavity 
method, could skim through these sections just familiar¬ 
izing themselves with the notation. In Sec. IV, we intro¬ 
duce a susceptibility matrix associated with this prob¬ 
lem and then provide an outline of the derivation of the 
self-consistent mean field equations via a two-step cavity 
method working directly at zero temperature. This ap¬ 
proach is not only different from the replica method but 
also from the analysis based on iterations in a message¬ 
passing algorithm [21, 22]. We end by illustrating how 
our cavity mean field picture relate to success and to fail¬ 
ure of sparse reconstruction in medium sized ^-penalized 
problems in Sec. V. The appendix has additional tech¬ 
nical details of the derivation, along with the cavity ap¬ 
proach worked out for finite temperature. 


II. STATISTICAL MECHANICS 
FORMULATION 


Here, we set up the general framework for investigating 
the regularized least-squares reconstruction algorithms. 
We assume that the data y = Hxo + £ are generated 
by a probability distribution p(y|xo,77), given an (un¬ 
known) sparse signal Xo and a (known) matrix H, and 
an (unknown) Gaussian noise vector £ whose compo¬ 
nents are i.i.d. samples from A/”(0 ,ct?). The vector xo 
is considered to be a random sample from a distribution 

-Po(xo) = rLPo^ao)- 

Although, in general, the probability distribution of 
H, V(U), could be a non-Gaussian distribution, at this 
point we consider it to be a multivariate Gaussian dis¬ 
tribution. The element-wise mean and the covariance 
matrix entries are given by 


[H ia ] av = 0 ( 2 ) 


\Hi a Hjb] - mSijSab ■ (3) 

We study the performance of an estimator of Xo, 
namely the location x of the minimum of a cost func¬ 
tion 


£o(x) 


(y-Hx) 2 

2a 2 


+ U(x). 


( 4 ) 


We can reformulate the cost minimization as a statistical 
mechanics problem where the cost function will play the 
role of energy. We assume the penalty/potential term 
V (x) is such that there is a unique minimum of £o- Note 
that x = argmin a , £o( x ) depends on y,H, meaning that 
it can be written as a function x = g(xo,H,£), using 
the fact that y = Hxo + C- We have set up an ensemble 
of problem instances by specifying the probability distri¬ 
bution of the variables Xo,H,£, so that we could study 
the performance of the estimator over this ensemble. In 
particular, it will be useful to extract moments of the 
distribution of the parameter estimation error x- Xo- 
In order to make a connection between the optimiza¬ 
tions problem and statistical mechanics, one could choose 
a probability distribution of x parametrized by /3, playing 
the role of inverse temperature, 


Aa(x|y,H) = -exp( - fi£ 0 (x)) 


'Z(P, y,H) 


expj -P 


(y-Hx) 2 

2cr 2 


+ U(x) 


( 5 ) 


with the normalization factor Z = Z(/3,y,H), known as 
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the partition function, given by 


Z(0, y, H ) = f d N x^ex p( - /3E 0 (x)) 

= p|-is( (y " 2 g x) + r(x)j|. 

( 6 ) 


If we send /3 to oo, equivalent to sending the temperature 
to zero, the probability gets concentrated at the mini¬ 
mum of the cost/energy function. Keep in mind that we 
define /? to be dimensionless. 

We will consider averages of functions of the form 
0(x, Xo) containing both the original sparse signal and 
the variable related to the estimate. The average of 
the function 0(x, xo) over the distribution p^(x|y,H) 
is given by 


f d N xO(x,x o) exp{-ff (y / H 2 x)2 -f3V{x)} 

O(x,x 0 ) = -- V V „ * a ‘ - V . 

/^xexp{-/3^^-^(x)} 

( 7 ) 

This ‘thermal’ average, represented by < ••• >, depends on 
the random variables Xo, H and Q. Note that in the limit 
/? -*• oo, this average should become /(x, x 0 ), for contin¬ 
uous /. Averaging the result of this calculation over the 
random instances of Xo, H and £ is a technical challenge 
related to quenched averages in disordered systems [16]. 

The function 0(x, x 0 ) = A( x _ Xo ) 2 plays an important 
role in our analysis. Its average corresponds to the mean 
squared estimation error: 


i N I 

MSE s x i [<*» - *-> 2 ]:,h,c = ]v^ (x “ Xo)2) ]x V 0 ,h,c- 

(8) 

We will use [•"] to denote quenched averages, with the 
relevant quenched variables indicated in the subscript, 
when necessary. We use the notation u = x - xo to indi¬ 
cate the estimation error vector. The size of the vector u 
provides a measure of the inaccuracy of the reconstruc¬ 
tion. 

In the context of penalized regression, the penalty 
function is often chosen to be a sum of potentials in¬ 
volving single variables, namely, U(x) = T, a U(x a ). We 
will focus on V (x) of this nature. An important special 
case for example is in compressed sensing with sparsity 
promoting regularizing potential U(x) = A|x|. In this pa¬ 
per, we would mostly restrict ourselves to the noiseless 
case, C = 0, although the same methods could be used to 
analyze the noisy case as well. 

For £ = 0, we will be interested in the result of the con¬ 
strained optimization problem of minimizing V (x) sub¬ 
ject to the constraint y = Hx. In the M,N -» oo limit, 
this problem may exhibit a phase transition from a per¬ 
fect reconstruction phase to an error-prone phase, with 
the MSE, mentioned above, as the order parameter. This 
constrained optimization could be studied in more than 
one equivalent ways. After taking (j -*■ oo limit, we will 
take the route a -* 0 to enforce the equality y = Hx. 


III. REPLICA APPROACH 


In this section, we review the replica approach to the 
problem [13, 14], presenting the mean field equations in 
terms of a distribution of asymptotically independent 
single-variable problems with a set of self-consistency 
conditions. In order to calculate quantities like the 
MSE, we need to compute quenched averages of the form 
[(d(x,xo))] , which is complicated by the presence of 

the denominator in Eq. (7). Formally, the denominator is 
handled by introducing n non-interacting replicas of the 
system and taking n -* 0, as shown below. In the noise¬ 
less case, £o( x ) depends on x as well as on xo ,H. To 
emphasize those additional dependences, we write £o( x ) 
as £o(x m ,Xq,H) i n the next few equations. 


(O(x,x 0 ))x = 


/ d N xO(x, x 0 ) exp ( - /3£ 0 ( x ,x 0 , H)) 
/ d^xexp ( - /3 £q(x, x o, H)) 


n -1 


= lim I / d N xexp( - /3£ 0 ( x , x 0 ,H)) 

n-*-0 \ J v 7 

J (i Ar x£>(x,x 0 )exp(-/3£ 0 ( x , x o,H)) 

/ n 

e>( x i, x o) n {d Ar x /2 exp( -ft(%, x o,H))}. 

(9) 


n=l 


Averaging over the quenched variables xq and H , we get 


[(0(x,xo)) x ]” H 

f* n 

/ El x o) exp ( - /? Y, £o( x m> x o, H)) 

( 10 ) 


= lim 
n-> 0 


fi=l 


Using y = Hxo in the noiseless case, the energy function 
for the n-th replica would be 


£o( x ao x o,H) = 


2a 2 


+ e ( x aJ 


(Hxo-Hx^) 2 i T/ . Hu 2 


2a 2 


+ e ( x a*) = 


2a 2 


+ V(u t _ l + x 0 ), (11) 


rewritten in terms of the error variables = x^ - Xo, /x = 
Thus, we are interested in average quantities in 
the replicated ensemble whose partition functions is given 

by 


[^Ch 

s' Tt 71 ( T T \ 2 " 

/ II du v exp [ ~ P{ E "a +^(u A1 + x 0 )}] 

M=1 M=1 Jx 0 ,H 

( 12 ) 


av 

Xq,H 
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In order to average over 'P(H), the only quantity that 
needs to be computed is 


fixed. The saddle point Q = Q,R = -?R satisfies the 
conditions: 


ex p(- E A ( Hu m) 2 ) 

L M =1 /cr Jh 

= Y o /dHexp[ - yTr (H T H) - A £ ujH T Hu,] 

1 

where ifo is the normalization term for the Gaussian dis¬ 
tribution of H, and a = M/N is the sampling ratio. We 
can look at the <f> = IMA 1 where U is an N by n matrix, 
[u-| ,U2,---,u ;j ]. We write the singular value decomposi¬ 
tion (SVD) for U as 



U = UAV t (14) 

where A denotes the n by n matrix with n non-zero sin¬ 
gular values equals A /x . Therefore, the eigenvalues of <j> 
are simply the square of these singular values. Changing 
the variable from u in (13) to U, V, and A: 

det(I w + ~ 2 i«M T )-f[(l + -AiAj). (15) 

aa z N aa z N ^ 

The right hand side of Eq. (15) can be written as 
det (l„ + w ith the elements of the n x n matrix 

Q are defined by Q = jjlA J lA which have the same Aj) 
eigenvalues. Therefore, we can rewrite Eq. (12) as 

[^:,h 

n 

n du^ex p 

M= 1 

(16) 



- E-Trlog(l n + EEq) + E^( u m + x o) 


aa* 


u= 1 


Using the Fourier representation of the 6 function 

Siu^Uv-NQ^) = J dR^ex^-iR^iu^Uv-NQ^)) 

(17) 

and inserting this delta function with an integral over 
Qa,„ in Eq. (12), we get 


[Z n ]Z 0 H = f II dQ^vdR^v exp[-S'(Q, R)] (18) 

’ J fl<lA 

S[ Q, R] = y Trlog (l„ + -^Q) - iiVTr(RQ) 


a 


log 


n 

/ n^ ex p[-*EU u ^E%+ x o)] 

J 11=1 LL.IA H 


(19) 


Xo 


This integral over Q, R can be evaluated using the saddle 
point method [13, 14] when M,N -»■ oo, holding a =jd 


Qn* = —((u^u,,)} 


( 20 ) 


R=-E[i n + EEq ]- 1 

2a z aa z 


( 21 ) 


obtained by differentiating 5(Q,R) with respect to the 
elements of Q,R. The expectation ((u^u„)) depends on 
R via 


(( u ^w» = P 


dF\ R) 
dR„ v 


with exp{-/?F(R)} 


^ n 

/ n { rfVu /-} ex i> [ - e 

J fl= 1 IA,IA 


( 22 ) 




+ 



(23) 


If U (.t) is a convex function, we expect a unique state 
and a replica symmetric solution for Q,R. This implies 
= {Q-q)d^ + q and R. IIV = (R- r)S^ u + r. With that 
ansatz, 


/ 1 li« u /*l ex P L 

J U=1 


“AI • 

/A,1A 


f* n 

= / n{rfV ex p[-(^>-)E< 

J fi=i fi 

- r (E w) 2 - /3 E v ( u m + x o)] 

~/ ( 2 


ex p [ - =4- Z u A ;|-« T (E “/.) - uE ^("i. + x »)] 

ZC eff /a a eff A» M 

(24) 

/3 2 cr 2 

identifying R - r = 0 2 and r = - 4 € . We have used 

"eff "eff 


/ «p(-||)p x p[^^(£p i .)] 


*exp[?-r(Ep») 2 ] 


(25) 


to decouple the item replica coupling in the (Dai u /a) 2 
term, at the cost of introducing another quenched vari¬ 
able £. Note that we require R - r > 0 and r < 0 for this 
approach to work. These inequalities follow from (21) 
and from Q - q > 0 and q > 0. The conditions on Q and q 
would be obvious once we look at interpretation of these 
quantities described below. 

For V (x) = Y, a U(x a ) we can simplify further. Remem¬ 
bering that we also need to do the quenched average over 
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x 0 , 


^ n 

/ nK Y, v} ex p[ - pt , v ( u u +xi o)] 

J fi=l ii,v n 

r* n 

/ n {*%.}«* 

j M =i 


•/^i vVek -£ Tu aJ 

“"eff M 


E y ( u M + x o) 


J £,*0 


=n 


n 

J n {dU/ia} 

fl=l 


exp 


P\ 9 2 E( M /ia £.a' u /j,a) 

ZC eff M 


cr -»■ 0. A nontrivial aspect of the zero temperature limit 
(/? -*• oo) is the quantity /3AQ in Eq. (30) that behaves 
differently in different phases of reconstruction. Using 
Eq. (29), this quantity is just /3 times the thermal fluc¬ 
tuation in u. The fluctuation-dissipation relation [23] 
implies that this quantity may be interpreted as a local 
susceptibility. 

In our following considerations based on the zero tem¬ 
perature cavity method, we formally introduce a suscep¬ 
tibility and use its properties to give a more transpar¬ 
ent derivation of the same equations via two step cavity 
method. We will outline the main points of the deriva¬ 
tions in the next section, and refer the reader to Appen¬ 
dices A and B for the details. 


U (‘U^a + Xq a ) 


av 




(26) 


Thus, in the saddle point approximation, each of the N 
components of u become effectively independent and the 
saddle point conditions reduce to a self-consistent prob¬ 
lem for each component a = 1,..., N. Since this self- 
consistent problem is similar for each index, we suppress 
the subscript a in u^ a and in xo a - For each a, we have 
the integral of the form 


IV. OUTLINE OF THE CAVITY APPROACH 

The optimization problem associated with the regular¬ 
ized least-squared based reconstruction problem involves 
minimizing the energy function £o( x ) = ^ y ~^ 2 x ^ + V(x). 
For the noise free case, using y = Hxo, the energy to be 
optimized may be rewritten as 

£(u) = E u t h t HuW(u+x 0 ). (31) 
2a z 



du^ exp 



2a cS 


E( u m _ ^)+E u ( u ^+ x 0 ) 

a ii 


av 




The replica problem corresponds to a single variable u 
follows the effective distribution 

PeS (u 1*0 ,0 = ; (27) 

Z{x o,4) 

with an effective mean-field Hamiltonian 

£ e g(u; x 0 ,£) = y~ 2 ~ (“ 2 ~ 2 £ u ) + u ( u + x o) (28) 

which depends on two quenched variables Xo and £. 
The variable Xq has the probability distribution po(xo), 
whereas £ is distributed according to a Gaussian distri¬ 
bution with mean zero and variance <r|. The two pa¬ 
rameters and cr| are given by the following set of 
self-consistency conditions 


ulWtf, A Q S Q-g = [((«-(t t » 2 )]^ (29) 


2 2 
°eff = ° + 


/3A Q 

1 

a 



(30) 


where the thermal averages (•••} over u are performed 
in the P e g ensemble and the so-called quenched average 
[■"]^o,? over var i a bles £ and Xq. 

In order to study the regularized least-squares recon¬ 
struction, we need to take the limits /3 -* oo, and then 


where u = x - Xo- Note that, unlike the function £o(x), 
which is parametrized by known quantities (the data y 
and the measurement matrix H) and can therefore be 
empirically optimized with respect to its argument, the 
closely related function £(u) = £o(u + Xo) depends on the 
knowledge of the original signal Xq. The purpose of deal¬ 
ing with this function is not to provide an algorithm to 
estimate this signal given measured data, but to study 
the statistical behavior of this function and its minima 
over the distribution of problem instances, namely, input 
signals and the measurement matrices. For example, we 
can calculate the distribution of each component of the 
estimation error vector u, given the distributions of xo 
and H. We will be working with £ (it), although the sus¬ 
ceptibility for a particular problem instance, to be defined 
below, could be defined completely in terms of £o(x). 

In case this cost function reproduces the correct an¬ 
swer, the function £(u) minimizes at u = 0. Looking at 
the structure of £{u) near zero tells us about potential 
shallow directions in parameter estimation error space, 
along which the cost function does rise significantly to 
reign in error. This failure could be quantified in terms 
of a susceptibility to error under linear perturbations to 
the cost function. The susceptibility provides a measure 
of robustness of the estimated parameters, and could in¬ 
dicate the trustworthiness of the reconstructed solution. 

In addition, the cavity method derivation of the mean 
field equations involves considering the effect of altering 
a single variable or of imposing an additional data con¬ 
straint. These modifications would be treated as ‘small’ 
perturbations to the large-scale optimization problem. 
The susceptibility, especially the local part of it, would 
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play a central role in computing the effect of these per¬ 
turbations. 

Definition of Susceptibility: As usual, let us consider 
a general regularization function V (x) for which there is 
a unique minimum to the cost function. Let the minimum 
of £(u) be at u = u. We introduce an augmented cost 
function 

£(u; f) = — u t H t Hu + V(u + x 0 ) - f • u. (32) 
2(7“ 

with the variables f , which are conjugate to u. Optimiz¬ 
ing £ (u; f) will produce an f dependent answer u = u(f). 
For small f we expect 

u(f) = u + xf+ •" (33) 

defining the susceptibility matrix %. Note that we want 
to take M, TV -*■ oo first before we take the f ->■ 0, to define 
susceptibility. We also expect that 

min£(u; f) = £ (u) - u I f - f + •••• (34) 

u 2 

When H is a large random matrix, we can make 
asymptotic estimates of the mean and the variance of 
different components of the susceptibility matrix fol¬ 
lowing earlier work on singular values of random matri¬ 
ces [24, 25]. This is carried out in Appendix A. Note 
that only the diagonal terms \ aa have non-trivial means, 
whereas the off-diagonal terms average to zero. For di¬ 
agonal terms, namely local susceptibilities, the average 
over all a’s, 

X^£X°°, (35) 

is expected to be self averaging in the large M, TV limit 
and be independent of the H for a matrix chosen from 
the distribution 'P(H). 

One should note, \ ab i a + b, is an H, x 0 -dependent 
number of the order 1 /\/M for a particular choice of H 
and of Xo- It only vanishes after averaging over prob¬ 
lem instances. Moreover, even if these 0(l/\/M) terms 
are small compared to the 0(1) diagonal terms, the 
off-diagonal terms have an important effect on the self- 
consistency equations via the so-called Onsager reaction 
term [26], as we will see in Appendix B. 

Removing a Variable Node: For the ensuing discus¬ 
sion, it is useful to visualize the problem in terms of a 
bipartite graph (see Fig. 1), where the variables x a are 
represented by circular nodes and the ‘constraints’ aris¬ 
ing from each (namely, the terms ^ {iJi-Ha H ia x a ) 2 = 
^2 (T. a Hi a Ua) 2 in the cost function) are represented by 
squares. Had we stuck to a finite temperature descrip¬ 
tion, this graph would be the factor graph [27]. If we in¬ 
sist on satisfying the condition y = Hx, this graph could 
be thought of as a Tanner graph [28], with the circles be¬ 
ing the variable nodes and the squares being the ‘check’ 
nodes. The system with N variables (circles) and M data 



FIG. 1. Bipartite graph with variable nodes (circles) and 
constraint nodes (squares). 

constraints (squares) would be represented as the (TV, M ) 
system. Our task is to relate properties of the ( N,M ) 
system to (TV - 1 ,M) system and obtain self-consistency 
conditions based on quantities that converge in the ther¬ 
modynamic limit, TV, M -*■ oo. 

We pick a particular node a and partition the cost 
function 

£(u) = -4 u t H t Hu+ V(u + x 0 ) (36) 

2cH 

into a contribution purely from the node, a term rep¬ 
resenting the interaction of the node variable with the 
rest of the system, and, lastly, the cost function of the 
(TV - 1 , M) system: 

(u) — — + U (u a + Xq a ) + — U a \l a • \l b U b 

2fT b\a 

+ 7r^{Tj h bU b ) 2 + Y t u (u b +X ob ). (37) 

Z(T b\a b\a 

Here, the a-th column of the H matrix is being repre¬ 
sented by the vector h a , and, the subscript \a indicates 
that we leave out the node a. Moreover, we approximated 
\v 2 a by its average value 

M i 

[^]h = IMh = E - = 1, (38) 

i i= 1 1V1 

since is a sum of M terms and is self-averaging. The 
typical fluctuation of from its average value 1 asymp¬ 
totically vanishes as 0(l/\/M). 

The system without node ‘a’, i.e., the system with a 
‘cavity’ (see Fig. 2), will have its own optimum values 
u b = u bl for all b + a. The variable u a interacts with the 
rest of the system through the quantity h a • ff b \ a h b u b - 
The program of cavity method is to characterize the dis¬ 
tribution of this quantity in terms of some parameters 
relating to the (TV - 1, M) system, and then use the fact 
that node ^ a , is statistically the same as every other node 
to relate these parameters to the distribution of u a . 

Since we are looking for the ground state, we minimize 
the expression in Eq. (37). This optimization becomes 
equivalent to the minimization of the following system 
with the node variable, the Onsager reaction term [26] 
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FIG. 2. The (N - 1, M) cavity system. Node a has been FIG. 3. The (N - 1, M - 1) cavity system. Node a and con- 
removed from the system by removing the links to it. straint i have been removed from the system by removing the 

links to them. 


and a contribution from the system with the ‘cavity’: 
min £ (u) = min { ^u 2 a + U(u a + x 0a ) 

U u a Z(J A 

~ + ^ Uaha ' + ^ a(fi \ o)} - 

(39) 


The detailed of this calculation is given in Appendix B. 
The Onsager term appears as a reaction 

term toward the variable u a due to the adjustment of 
the other nodes after optimizing over them while holding 
u a fixed. The coefficient of u 2 turns out to be indepen¬ 
dent of a for large systems, because of self-averaging. We 
combine the u 2 terms in Eq. (39) to get 


minf (u) = min { 

U U a 


r, 2 Ua + 

Z(T eS 


F *^0 a ) 


+ -^u a h a • hf ,u b + f\ a (u\ a )} 

a b*a 


(40) 


with a 2 s = tr 2 +x/ a - 

If we did not have u a , the system would be optimized 
at u\ a . But now that u a is coupled to h a • Y,b*a hbUf,, one 
needs to characterize the distribution of this quantity. 
Let us define 

M 

Va ~ ~ La ' — — ^0 Hi a Vi (41) 

b±a i=i 


where t>i are the components of the residual vector v = 
Y.b±a hyttf,. Note that h a and v are independent vari¬ 
ables. Over the ensemble of problem instances, ?y a has a 
distribution which is expected to be Gaussian, given that 
it is a sum of many contributions. All we need to do is 
to calculate the mean and the variance of this variable, 
using the independence of Hi a ’s and Vj’s. 

Wx;,h = -E[5Jh['',£h = 0 

i 

- 4 IKCh- (42) 

Thus, all we need to know is the variance, [f, 2 ]” H , of 
individual residuals for the (N - 1 ,M) system. 


Removing a Constraint Node: An individual compo¬ 
nent Vi, being a sum of many variables, is expected to be 
Gaussian over the problem instance ensemble. The mean 
turns out to be zero but the variance requires a more 
careful analysis. Because the variables Hi b s and ft c ’s are 
strongly correlated, [ufJxo.H * T,b*a[H 2 b ]u[u 2 b )Z,H- To 
overcome this difficulty we need to replace u b s by vari¬ 
ables that are independent of the i-tli row of the H ma¬ 
trix. A similar problem arises when using cavity method 
in the context of Hopfield neural networks [19]. 

In order to find these quantities, we go one step fur¬ 
ther by removing the constraint ‘i’. The components of 
the error vector u' b in the (N - 1 ,M - 1) system is in¬ 
deed independent of the i-th row of the H matrix. What 
remains to be done is to relate D; to u b s. 

To find Vi = Hi b u b , we break up the minimization 

over u\ Q into two steps: 




mjn {f\ al (u\ a )| 

£ HibUb=Vi 
b±a 



(43) 

In Appendix B we show that optimization of Eq. (43) 
is equivalent to 

min {-^(vi - v'i) 2 + —v 2 }, with v[ = £ H ib u' b . (44) 
Vi 2 X 2 o 2 b ^ a 


We expect the first term to be minimized at Vi = v\, 
since minimization of the cost function £\ a ,(u\ Q ), with¬ 
out the i-th constraint, would be at u\ Q = u^ a , making 

Vi = T,b±a H ibU b = T, b ±a H ibU b = v\ at that point. The 
coefficient of (vi - v^) 2 depends only on \ and not on i, 
because of self-averaging once again. The second term 
forces the residual to be small, imposing correlations be¬ 
tween H ib s and u b s. We can capture the effect of such 
correlations by considering how Vi is optimized when 
both terms are present. 

Minimizing with respect to Vi gives us 

Vi = 1 Y £ H ib u b■ ( 45 ) 

The denominator (1 + -^) ‘scales down’ the uncon- 
strained answer Y,b±a Hib u b- ^ 1S the same factor that 
relates a 2 to a 2 s . 









N = 200, p - 0.2, a c « 0.51 


Given that this result is true for any z’s, Eq. (39) be¬ 
comes 

min£(u) = mm{-^-u 2 a - -^£ a u a + U(u a + x 0a )} (46) 

U “» 2 °eff a eS 

with 


(47) 

i bta 


being a random Gaussian variable with mean zero 
and variance cr| = h = The quantity q = 

jh ^,^K 2 ]"h is the MSE for the (N-1,M-1) 
system. Insisting that q is also the MSE of the ( N,M ) 
system is one of the self-consistency conditions. 

Cavity Method Self-consistent Equations: In sum¬ 
mary, the zero temperature problem boils down to op¬ 
timizing a collection of ‘independent’ variables 

u a (fa) = argmin {—^~(u 2 a - 2 £ a u a ) + U(u a + x 0a ) - f a u a j 
u a Za eS 

(48) 

using the same effective cost function as Eq. (28) in 
Sec. Ill, but with an additional linear perturbation. The 
variables £ a are chosen independently from A/”(0, <r|) and 
xq's are chosen independently from the probability dis¬ 
tribution Po(xo)- With Xoa,€a chosen randomly, we can 
obtain the distributions of u a ( 0) and y aa = dUa ^ a ' ) | 

The two parameters a 2 s and <r| are decided by the 
following set of self-consistency conditions: 

<?=[«»( 0) 2 Cj, X =[*“]£,£ (49) 

and 


CTeff = O' 2 + 


X 

a 



(50) 


So far, we have analyzed the = 0 case. The presence 
of additive noise can be handled easily by our method, 
with Vi = Y,b*a H ibUb + Ci and v i = Y.b±aHib'u' b + <». The 
effect of the additive noise could be absorbed in the £ a 
variable, with the new self-consistency condition for the 
variance <r| would being: 




(51) 


In this formulation, we do not need to invoke tem¬ 
perature. The average local susceptibility \ plays the 
role of the quantity /3AQ in the replica approach. In 
our subsequent work [29], we will use \ 1° distinguish 
phases around zero-temperature critical point, including 
the Donoho-Tanner transition [8]. 


V. MEAN FIELD THEORY AND FINITE SIZE 
4-PENALTY RECONSTRUCTION 

The mean field self-consistency equations derived 
above are for the thermodynamic limit, namely for 



/ (External field on each node) 


FIG. 4. Plot of the average error of the solution as a func¬ 
tion of external field ‘/’ on each node in two different regimes: 
green line being for perfect recovery regime and the red line 
corresponding error-prone one. The insets with correspond¬ 
ing colors are the responses of individual nodes, showing the 
staircase like behavior mentioned in the text. 


M,N -»■ oo. Many of the the quantities we define, like 
X and t7g ff (= a 2 + y/a), strictly make sense only after we 
take the problem size to infinity. How do these concepts 
arise in finite sized problems? To investigate this ques¬ 
tion, we look at the average local susceptibility \ f° r the 
important case of Basis Pursuit, which corresponds to 
U(x) = |]x|]i and to er -»■ 0. 

In particular, we carry out the numerical experiment 
for minimization of | |u + x 0 11 1 - fu a , for each a = 1,... ,IV, 
subject to Hu = 0, by linear programming using the 
CVXOPT package [30]. We obtain the M x TV matrix 
H by filling it with independent entries from a Gaussian 
distribution with mean zero and variance 1/M. In this 
example, the size of the vector x is N = 200, and contains 
K = 40 randomly placed elements driven from a standard 
Gaussian distribution. 

Since we solve a linear programming problem with the 
cost function perturbed by the linear term - fu a , solu¬ 
tions are chosen from the extreme points of a convex 
polytope. As / is changed, the solution, would jump from 
one extreme point to another at particular thresholds. As 
a result, functions u a {f) look like a set of staircases (see 
insets in Fig. 4). 

To see the average local susceptibility emerge in the 
thermodynamic limit from these step functions, we need 
to compute the average response. The average parame¬ 
ter estimation error, A u(f) = jf T, a ( u a(f ) - r«a(0)), as a 
function of external field ‘/’, shown for different regimes 
in Fig. 4, are strikingly different. In the high a solution, 
the average error has no response to the external field up 
to a large threshold. However, in the low a case, very 
small external fields can perturb the estimated solution 
to a new one. This is an indication of the robustness of 
the solution in the good recovery region and lack of it in 
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the error-prone regime, and hints at a phase transition 
in between. In this particular case, that transition takes 
place at a = a c « 0.51. 

To connect results shown in Fig. 4 to the average local 
susceptibility, we need to make linear fit to the average 
response A u(f) as a function of / near / = 0. Note that 
differentiating u a (f) with respect to / for the finite size 
system and then summing over a would give us a very 
different answer. In the good regime, the average local 
susceptibility, obtained by the fit, is approximately zero. 
As one decreases the number of constraints on the system 
past a threshold, this susceptibility becomes non-zero, as 
can be seen from the slope of the average response near 
zero, for the low a solution in Fig. 4. 

VI. AFTERWORD 

In this study, we directly treat the regularized least- 
squared optimization problem and show how to adapt the 
cavity method for doing mean field theory in the context. 
The mean field theory leads to a self-consistency condi¬ 
tion on average mean squared error (MSE), since error 
in estimating one variable affects error in others. Care¬ 
ful derivation of the self-consistency condition, without 
using replica trick, involve accounting for subtle correla¬ 
tions in the system. To take care of these correlations, we 
needed a two-step cavity approach: one step removing a 
variable and then, another, removing a data constraint. 

Although, we have emphasized the zero-temperature 
treatment, the cavity method can be used for finite tem¬ 
perature results as well. For completeness, we have pro¬ 
vided the corresponding derivation in Appendix C. The 
key connection with the zero-temperature treatment is 
via the fluctuation-dissipation theorem [23], which relates 
thermal fluctuation with susceptibility. 

The cavity approach looks at the behavior of the sys¬ 
tem for a particular choice of quenched variables, H and 
x 0 in this case. In contrast, the replica approach cen¬ 
ters on immediately averaging those quenched variables 
away. In the context of compressed sensing, one can 
imagine many problems, where the matrix H is non- 
random. Currently there is no obvious way to extend 
the replica mean field treatment for such sensing matri¬ 
ces. The cavity method could be a more versatile tool 
in this regard. Extensions of this tool to other classes of 
compressed sensing problems would be a goal of future 
studies. 


Appendix A: Susceptibility and Conjugate Fields 

We want to minimize the augmented cost function 

£(u;f) = — -u t H t Hu + V(u + xo) - f - u. (Al) 
2(7“ 

producing an f-dependent minimum u = u(f). The sus¬ 
ceptibility matrix x is defined by the small f expansion: 


u(f) = u(0) + %f + •••. If £ is differentiable, the optimum 
u(f) is the solution of 

f = V u £(u) (A2) 

Where f = 0, u is at its optimal value u. Formally, if 
perturbation f is small and £ is differentiable to higher 
orders, we can expect du to be small and, therefore, Tay¬ 
lor expand £(u + du) around u = u 

£(u + du) = £(u) + j- ^ SuaSu b d f +•••. (A3) 
2 ^ du a du b u=u 

From (A2) and (A3), we can identify the inverse suscep¬ 
tibility (X^)ab = d uju b \u-. and can show that 

min£(u; f) = £ (u) - u T f - -f T xf + •• (A4) 

u 2 

It is simplest to study the properties of Xi when the 
potential U ( x ) has continuous second derivatives. If we 
Taylor expand around the solution u = u, we will have 

1 ffTu 

£(u + du; f) =£ (u; 0) + -du T [—— + W(x)]du 

2 a 1 

- f • (u + du) + •••. (A5) 

where W a b( x) = U"(u a + xq a)d a b- Optimizing over du, 
we see that the susceptibility matrix would be given by 

X( X ,H) = + W(x)] -1 . (A6) 

o 

These statements are true, for fixed N, with f and du 
going to zero. However, we are interested in the opposite 
limit, N -*• oo first and then taking f, du small. We also 
need to deal with U(x ) that is singular. The way we will 
treat this difficulty is as follows. We will keep U(x ) to 
be smooth, take N -* oo limit on x defined by Eq. A6, 
with the assumption that Wo 0 (x)’s come from a well- 
defined distribution. In that case, when H is a large 
random matrix, we can make asymptotic estimates of 
the mean and the variance of different components of the 
susceptibility matrix Xi using diagrammatic expansions 
and resumming. 

To find the asymptotic behavior of Xi we formally ex¬ 
pand the RHS of Eq. (A6) in powers of 11 7 1 (see Fig. 5) 
and compute moments by averaging over Hi a diagram- 
matically. Namely, we expand 

x =w 1 - 4 w 1 H t HW 1 

a 2 

+ ^W^tfHW'tfHW 1 - - (A7) 

cr 4 V ' 

and then compute moments of the form 
[X aibl X a2b2 —X akbkH HCi"-Hiic,]n using Wick’s the¬ 
orem [31], since H distribution is Gaussian with mean 
and covariance specified by Eq. (2) and Eq. (3), respec¬ 
tively. One can write [x(x,H)]j^ as [W(x)-S(x)I Ar ]- 1 , 
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Xau = 


w-J 





a 


a i i b b j j a 




FIG. 5. The diagrammatic expansion of susceptibility. 



FIG. 7. The leading planar diagrams in covariance computa¬ 
tion are of the order 0(j as can be seen from counting 
a factor of M or N for appropriate index loop, and counting 
a factor of jj for each double-line contraction coming from 
averaging over the matrix elements. 



FIG. 6. Planar diagrams contributing to the self-energy. 


where £(x) is a self-energy term. Using the fact that, in 
the large M,N limit, only the planar diagrams survive, 
the contributions to the self-energy are shown in Fig. 6 
and can be re-summed as 


£(x) 


1 1 

a2 1+ jt^T,aX aa (^) 


(A8) 


Hence, the mean susceptibility (holding x 0 ’s fixed but 
averaging over H) is given by 


X av (x)E[ x (x,H)]- 


W(x) + 


M 

Ala 2 + Tr[x av (x)] 


IjvJ 

(A9) 


Covariance of * entries, [y a6 (x, H)y cd (x, H)]g could be 
computed using the diagrams in Fig. 7 and they are sup¬ 
pressed in the large M,N limit, since their contributions 
are 

Therefore, we get the local susceptibilities, i.e. diago¬ 
nal terms, to be: 


[x°°( x )]h = 


W aa (x a ) + 


a 2 + XM 


x(x)-E[r(x)g. 


(A10) 

(All) 


Here, [x° 6 ]h = 0 for a ± b, but each \ ab is of order 
0(1 /n/M). In particular, we will need the correlation 
of X ab with the corresponding matrix elements of H T H. 


Using the identity [W + H g2 H ]x = Ijv> we can prove a 
useful corollary of the result in Eq. (A9). 


4[H t H X (x,H)]^ = 
a- 

= Ijv - W(x)[W(x) + 


I JV -W(x) X av (x) 

M 

Ala 2 + Tr[x av (x)] 


Ijv] 1 


M 

Ala 2 + Tr[x av (x)] 
M X av (x) 


[W + 


M 

Ala 2 + Tr[x av (x)] 


Ijv] 1 


Ala 2 + Tr[x av (x)] 
QX av (x) 
aa 2 + x(x) 


(A12) 


In particular, Eq. (A12) implies 


[Tr(H T H X (x, H))]£ = M 2 g ' X _ (x) (A13) 

aa 2 + x(x) 

which we will use this identity in Appendix B. 

Notice that many observations made here are inde¬ 
pendent of the assumption that U(x) has a continu¬ 
ous second derivative. For example, in the case of 
compressed sensing with U(x ) = A|a’|, we could de¬ 
fine a second-differentiable function U e (x) such that 
lim £ ^o U € {x) = U(x), for example, U e (x) = Vx 2 + e 2 or 
U e (x) = - log(2cosh(ex)). If x a = x 0a + u a goes to zero 
as e vanishes, then the corresponding IU aa = U"{x a ) di¬ 
verges. However, the corresponding local susceptibility, 
X aa , just becomes zero in this limit. Therefore, as e -» 0, 
the idea of using effective single variable optimization 
problems and determining the self-consistent distribution 
of x a and x“ a remains meaningful. We just need to sep¬ 
arate out the set of variables x a for which W aa diverges 
and treat this set carefully. As a consequence of x re¬ 
maining well-defined in the e -» 0 limit. 
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Appendix B: Zero-temperature Cavity Method 
1. Removing a Variable Node 

We expand the following equation in terms of node ‘a’ 
£(u) = -Eu T H T Hu + U(u + x 0 ) (Bl) 


The quantity E6,c#o(hfc)i(h c ) J x^, is independent of h a . 
As a result, £ bjC#a (h a -h fc )(h a -h c ) X ^ can be replaced by 
£ bc#a (h b ) • (h c)X\a/M, thanks to the self-averaging of 
(h a ), (h a )j. Using Eq. (A13) for the (IV- 1 ,M) system, 


E h 6 -h cX ^ 

b,cta 


M(j 2 X\a 
a<j2 + X\a 


Ma 2 x 
aa 2 + x 


(B8) 


which results in 


£(u) 


~ rt 9 ^a ^"t Xq a) 9 ' E] ^b^b 

2(7 b\a 

+ ^(E^) 2 + E^ + *06) (B2) 

Za b\a b\a 


The system with a cavity (at node ‘a’) has a new op¬ 
timum values Ub = Ub, for all b + a. In the complete 
system, the variable from node ‘a’, namely u a , interacts 
with the rest of the variable via the term h a • Y.b\a hfcUf,. 
We rewrite Eq. (B2) representing the the interaction be¬ 
tween the node and the rest as a perturbation by a field: 

£(u) = 7^ U l +U(u a + X 0a ) + £\a(u\a) " U \a f \a - (B3) 

The cost function of the (IV - 1 ,M) system is £\ a (u\ a ). 
We identify (f\ a )b = -yjhfc • h a u a to be the local force 
exerting on each node Ub, due to presence of node u a - 
Since we are looking for the ground state, we minimize 
the expression in Eq. (B3) 


min£ (u) 

U 


min {oT u « + % 

«o,U\a 2,(7 ^ 


+ XOa) 


+ £\a( U \a) -U^ a f\ a }. 


(B4) 


Given that h b -h a is of the order 1 /\/M, f\ a is small, and 
we can invoke the definition of susceptibility \\a for the 
(N - 1 ,M) system and use expansion of the minimized 
cost function (A4). 

min£(u) = min{— 1 —u 2 a + U{u a + zoa) 

U U a Z(J z 

+ ^\a(u\ a ) - U^ a f\ a - ^f\aX\af\a} ( B 5) 
and plugging in (f\ a ) b = h b • h a u a , we get 
min £(u) = min { — -^-u 2 a + U(u a + x 0a ) 

u u a 2crf ff 

+ — U a h a • E h bU b + £\a(u\o)} ( B6 ) 

a b±a 

where 

4 = 4 ( 1 - 4 E (ha • hfc)(h a • h c ) X M (B7) 

a cS a \ b,c*a ) 


with the last step having to do with \ becoming inde¬ 
pendent of N, M asymptotically. Using this last relation 
Eq. (B8), in Eq. (B7) we get 

a 2 ff = a 2 + * (B9) 

a 

Looking at Eq. (B6), we still have to determine the size 
of the coupling of the node variable u a to the rest of the 
system via r] a = -h a • v with v = Y,b±a h b w b . In Section 
IV, we showed that the first and the second moments of 
the rj a are: 


[Va]Z, H =0 

tt,H=^E[^]xo,H- (BIO) 

The order k cumulants go as M 1 ~ k ^ 2 and, for k > 2, they 
tend to zero as M goes to infinity. Therefore, we will stop 
with the variance and treat ?y a as a zero-mean Gaussian 
variable. We still need the variance, for which we need a 
condition to determine H . This requires a second 

step of the cavity method. 


2. Removing a Constraint Node 


The subtlety in determining [D?]^ v H involves account¬ 
ing for correlation between matrix elements Hib and the 
optimal values Ub of the (IV- 1 ,M) system. To do this, 
we need to set up an (IV- 1, M— 1) system with the con¬ 
straint l i’ removed (see Fig. 3). To find - X! Hib^b, we 

b±a 

write the minimization over u\ Q as follows: 




min {£\ai(u\ 0 )} 

E H ib u h =Vi 

b*CL 


+ 2^ 


(BH) 

with the first minimization being a constrained one for 
the (IV- 1, M- 1) system, subject to Y,b±a Hib'Ub = W and 
the second one being over v l . The cost function for the 
system without nodes a, i is represented by £ (u\ a )\^. The 
term - j^v 2 represents the constraint coming from the i- 
th observation. Had we done an unconstrained optimiza¬ 
tion of £\ 0 ,;(u\ 0 ), the optimum u\ Q would be independent 
of Hib- Trying to keep ty small perturbs this solution by 
a small amount and induces correlation with Hib■ Our 
strategy would be to compute the effect of perturbation 
in terms of the system susceptibility. 
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In order to do constrained minimization, we use the 
Lagrange multiplier method 

min£\ a (u\ a ) = max min {£\ oi (u\ a ) 

U\ a 7 i U \ a ,Vi 

+ 7^ v i - 7 i(.Vi - £ HibUb)}- (B12) 

ACJ bta 


with 

(B22) 

i bta 

being a random Gaussian variable with mean zero and 
variance 


Minimizing Eq. (B12) with respect to Vi we get Vi = 
cr 2 7 i, and making that substitution for ty into the cost 
function we get 

nrin£\ a (u\ a ) =maxmin{£\ 0 (u\ 0 ) - \a 2 7 ? - u£g} 

U\ a 7i u \a Z ' 

(B13) 

=max {-h 2 7* 2 + £\*(g)} (B14) 

with g b = and with £* (g) is defined as 

£\i(g) = min {£\ ai (u\ 0 ) - u^ a g} (B15) 

where the presence of g alters the optimal u\ Q from the 
unconstrained optimum Uy o . Since each component of g 

is small ( 0 (l/\/M)), we can expand around u^ o using 
X\ai, the susceptibility of the (TV - 1 ,M - 1) system, as 
in Eq. (A4). Therefore, £*(g) can be written as 

£\i(s) = Ai( u \«) - u \Is - ^g T *V:g + - (B16) 

Now, Eq. (B14) becomes 

min£\ a (u\ a ) = min{-h 2 7 2 +£: w (u\ a )-u^g-ig T x\ al :g} 

( B17 ) 

The quadratic term g r x\ 0 ,g = 7 i'Lij H ib H ic x^ ai can 
be simplified because of self-averaging. We have 

£[iM c ]H v xw = -Jv £x& - h (bis) 

ij ^ b ^ 



[? 2 ] 


av 

Xq,H 


Y[HiaH ja ]% £ [H ib H jc ]% K<]x V o,h 

i,j b,cta 




£ 

b,cta 


$ij & be 

M 


K«cl 


av 

x 0 .H 


1 

M 


E r t2 -1 av 

K Jx 0 ,H 

bta 


Q 

a 


(B23) 


thanks to Hj a and Hi b being independent for a + b, as 
well as u ' b s being independent of those matrix elements. 
Here the quantity q = Y [w ,2 ]xq h th e MSE for 

b,cta 

the (TV - 1 ,M- 1) system. Taking into consider that q is 
a self-averaging quantity and is asymptotically the same 
for the (TV, M) system, we get the independent single 
variable optimization 

min {^4“( u o “ 2 (« u «) +U(u a + x 0a )} (B24) 

with £ a e Af( 0 , <j|) and cr 2 ff = a 2 + x/a. 


Appendix C: Finite Temperature Cavity Method 

In this section, we solve the finite temperature problem 
formulated in Sec. II via the cavity method. With the 
cost function written in terms of u as 

£(u) = ^(Hu) 2 + V(u + x 0 ) (Cl) 

we define the Boltzmann distribution P(u|H,x 0 ): 


once more using the fact that the average local suscep¬ 
tibility x i s nearly the same for the (TV, M) system and 
the (TV - 1, M - 1) system. 

Putting everything together 


P(u|H,x 0 ) 


1 

Z(/3|H,x 0 ) 


e 




(C2) 


with the normalization factor/partition function given by 


min£\a(u\ 0 ) = max {-y— (1 + -^ 7)7 2 + 7 i £ H ib u b }, 
u \a n 2 acr 2 

(B19) 

maximizing with respect to 7 \ and then using Vi = cr 2 ^i 
gives us 

Vi = —V £ H ibU ' b . (B20) 

1 H- 2 bta 

olg z 

Since this result is true for any i’s, Eq. (B 6 ) becomes 

mm f (u) = mm { ^-u 2 a - —L^- u a + U(u a + x 0a ) } 

(B21) 


Z(/3 |H,x 0 ) = J due~ 0£ (C3) 

We now apply the first step of the two-step cavity 
method. First, we rewrite £ as an interaction between 
variable u a and the rest of the variables 

£(u) = ^h 2 Ua+-^U a lv£ h b U b + U(u a +X 0 a )+£\a(u\ a ) 
a b*a 

(C4) 

By defining 

Bo * Ybj=a BfeUfr 

Va = ~ 


h 2 


(C5) 
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and using h 2 = 1 + 0(- 7 =) we have 

£= ^(Wa - 2u a r? a ) +c/(u a + XOo) + £\ a (u\a) (C6) 

with u\ a , £\ a etc defined as in Sec. IV. Equation (C6) in¬ 
dicates that the variable u a interacts with all the others 
only through r] a . Therefore, we rewrite the marginal dis¬ 
tribution P(u a ) as an integral over the joint distribution 
of r] a and u a , P(u a ,r] a ). 


then we arrive at 

(Va)\a = - YjHiaiVi) (C15) 

i 

and 

($vl)\a = E H ia H ja {8vi5vj) 
ij 

* Z Jj S ij( Sv i Sv j) = Z(H 2 > ( C16 ) 

IJ % 


P(u a ) = J du\ a e 0£ = J dr] a P(u a ,r] a ) (C7) 
where 

P(u a ,Va) = 4 f du \a d (Va + h a • Z h b U &) e ^ ( C8 ) 

Z J b±a 

for all a = 1,...,TV. Now we introduce a cavity ‘field’ 
distribution of r/ a at the removed node a as 

P\a(Va) = f du 6(iia + h a ■ £ h b u b )e~ d£ ^. (C9) 

"\a b±a 

By comparing (C8) and (C9), we get 

p, / drj a exp[-/j{ ( " ‘ +U(u a +x 0 a)}]P\a(Va) 

f du a dr) a ,ex p[—/g{ ( '' a na) +t/(ii„+i 0 .)}]f\.b.) 

(C1 °) 

The assumption of continuity of the global ground 
state, even in the presence of the cavity after removing 
node a, is equivalent to the replica symmetric hypothesis. 
This is a valid assumption when the penalty function V 
is convex. Therefore, in the limit of TV -» oo, even if the 
nodes of (TV - 1 ,M) system are weakly correlated, rj a is 
still a sum of many variables and P(r) a )\ a can well be 
approximated by a Gaussian distribution. 


Having done that we need to compute (vi) and (5vj). To 
do so, this time in addition to site a we exclude site i. 
Hence from (C6) we get 

£\a( U \a) = ^2 V i + £ w( U \a) ( C17 ) 

After carrying out the same computation as in (C7), 
(C8), and (CIO) for the marginal distribution Q\ a (vi), 
we arrive at 


eX P{~ } 

/ dv i ex P { - 4* v i - (t 2(£fc } } 


Q\a(Vi) = 


(CIS) 


Therefore 


Q\a(Vi ) = 


ex P - 2^( 1 + 




(vj)\ a j 


f<(Svf)y a 


f dvi exp | - 2^2 (1 + 
and then (8vf) is 

(<K 2 > = 




Vi - 


(Vi)\ai 


(Svh\a 


M 1 + 


P(^ V i)\ai 




(C19) 

(C20) 


and (vi) is at 


(Va~(Va)\ a ) 2 

P\a(Va) oc e~ (Cll) 

Then (CIO) becomes 

p, , _ eX P{-^2( 1 -E<‘ 5l? a>\a)“a + ^Ca>\a-/3^(Ma + a:0a)} 

° / duaexp{-£z(l--2z(6ril)\a)ul+^(ri a )\ a -pU(u a +xo a )} 

(Cl 2 ) 

Therefore, only the thermal averages (??a}\a and the ther¬ 
mal fluctuation strength {8r]l)\ a = (( Va-(Va)\a) 2 )\a of the 
field r] a for the distribution d\ a (r) a ) are left to be com¬ 
puted. In that process the effects of (weak) correlation 
between the u a ’s have to be accounted for. Define, as in 
Sec. IV, 




{Vi)\ai 

rr A 


(C21) 


Notice how both these moments for the (TV- 1 ,M) sys¬ 
tem is scaled down by the same factor, when compared to 
the moments for the (TV - 1 ,M - 1) system. Using argu¬ 
ments similar to the fluctuation-dissipation [23] theorem, 
we could show that the change in ( m) due a change in 
( Vi)\ai , susceptibility of sorts, is closely related to {Svf)^ 
times {Svf), with the first term of the product playing the 
role of temperature. 

Carrying on, we get 

{dv.i }\ai — ddibdPic{dVb5u c )\ a i 


Vi = Z HibUb 
b±a 


and utilize our definition, 

r?a = -E PiaVi 

i 


(C13) 

b,c±a 


= E 


fe,c^a 


l 

(C14) 

M 


1 „ TV 

—8 bc (Su b 8u c )\ ai + 0(j ^- 3 / 2 , 

E 

bta 


TV 1 / 2 

M ’ 

(C22) 
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since the (N- 1, M— 1) system, indicated by the subscript 
l \ai\ is independent of Hp, and i?i C , Hn,Hi C = M ^ bc + 
O(jj) fluctuations, and (5u b 6u c )\ ai ~ 0(^=, ^=) when 
b + c, indicating that nodes are only weakly correlated. 

To make connection with the notation in Sec. Ill, let 
us introduce A Q 

A <Q = ^ E(^a> - E (Sni\ ai , (C23) 

JV a iv 1 


the second approximate equality becoming exact in the 
thermodynamic limit. Then, we have 


= A Q/a. 

Therefore from (C16), (C20), and (C23) 
P /r 2 \ 1 

Sr la)\a = 


(1 + 


) 


pAQ/ot 

and from (C15), (C21), and (C23) 

Si H-io. Hib{ u b)\ 


{Va)\a = 


(1 + 2AQ) 

V acr^ / 

Moreover, we define 

£a = ^ Hii,(ub)\ a i 

i bta 

which has variance cr| = g/a with q 

< 1 = ]^XK> 2 18 jv7[EW\. 


(C24) 


(C25) 


(C26) 


(C27) 


(C28) 


being the mean squared error. Therefore, by plugging 
(C26) and (C27) into Eq. (C12), the marginal distribu¬ 
tion for single variable u a becomes 

ex P{ _ 2 ^-( u a - 2u a £ a ) - pU(x 0a + u a )} 

P(u a ) =_ — _ 

/ du a exp{-^s-(ul - 2 u a £ a ) - PU(x 0a + u a )} 

(C29) 

with (jg ff = <r 2 (l + )■ and the effective cost function 

for the individual node is 

£ (Ua) = ( U a - 2u a^a) + U ( X 0a + U a ) (C30) 


Therefore, with £ replaced by a set of effectively de¬ 
coupled nodes, and the sum over index a replaced by 
a quenched average over £, a ,Xo a . As a result, the self- 
consistency conditions for the MSE 


9 = 


1 

N 


N 


EM 2 


and for 

1 N 

AQ=-Y l (Sul) 

a =1 

reduce to 


9 = 



and 


(C31) 


(C32) 


(C33) 


AQ=[(fe 2 ) eff ]^ o (C34) 


where the thermal average (.. .} c ff is performed with re¬ 
spect to the effective individual node distribution (27) 
and [...] is the quenched average over variables 
£,Xo, with £ drawn from Af(0,q/a) and signal xq drawn 
independently from a distribution P(x o). These self- 
consistency equations are exactly the same those from 
the replica symmetric ansatz in Sec. III. 
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